--- title: "Seasonal Analysis with the TSstudio Package" author: "Rami Krispin (@Rami_Krispin)" date: "2018-09-17" output: html_document ---

Seasonality analysis

The TSstudio package provides a variety of functions for seasonality analysis, with the use of interactive data visualizations tools such as seasonal, heatmap, quantile, surface and polar plots, based on the plotly package engine. Most of those functions support the main time series classes (ts, xts and zoo objects, and as well data frame objects data.frame, data.table and tbl) with a frequency between daily and quarterly (with the exception of the quantile plot, which support half-hour and hourly frequencies).

# install.packages("TSstudio")
library(TSstudio)
packageVersion("TSstudio")
## [1] '0.1.2'

The USgas dataset

In the following examples, we will use the USgas dataset. This dataset is one of the TSstudio package datasets, and a good example of a time series with a strong seasonal pattern. This dataset represents the monthly natural gas consumption in the US since 2000:

# Load the US monthly natural gas consumption
library(TSstudio)
data("USgas")

ts_info(USgas)
##  The USgas series is a ts object with 1 variable and 222 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2018 6
ts_plot(USgas,
        title = "US Natural Gas Consumption",
        Xtitle = "Year",
        Ytitle = "Billion Cubic Feet"
        )

The ts_seasonal function

The ts_seasonal function was designed for plotting time series data by its full frequency cycle (hence, hence break the series by years for monthly data) and/or by its frequency units (e.g., by the months of the year for monthly data). The function supports ts, xts, zoo and data frame family objects with a frequency between daily to quarterly. The function has three modes (and a fourth one that includes all the three modes together):

  1. normal - type provides a break of the series by the cycle units (or years). This allows identifying whether the series has a reputable seasonal pattern:
ts_seasonal(USgas, type = "normal")

The colors are being set automatically by a sequential color palette in chronical order.

  1. cycle - this option provides a view of each frequency unit of the series across the full cycle units of the frequency, in the example below you can notice, for example, that the consumption during January in most of the years was the peak of the year:
ts_seasonal(USgas, type = "cycle")
  1. box - for representing the cycle units with a box plot:
ts_seasonal(USgas, type = "box")

Alternatively, setting the type = all, print the three options above together in one plot:

ts_seasonal(USgas, type = "all")

Note: that the colors of the months in the cycle plot corresponding to the colors in the box plot.

While the box mode provides a useful information about the distribution of each frequency units, it might be misleading as the series was not detrended. Similarly, the cycle represents the variation and trend, if exists, of each frequency unit over time, yet it is hard to identify the distribution. The all mode provides the full picture, as it is allowed to identify a seasonal pattern within the series without detrending it.

Colors setting

The colors of the plots can be modified to any of the RColorBrewer or viridis packages palettes options (see below the available palettes). The palette_normal argument set the colors of the normal mode. As mentioned above, colors set according to the chronological order of the lines. The palette argument set the colors of the both the cycle and box options. In the example below, the colors of the normal plot are set to inferno palette from the viridis package and the cycle and box plots are set to Accent palette from the RColorBrewer package.

ts_seasonal(USgas, type = "all", palette_normal = "inferno", palette = "Accent")

Below are the possible palettes of the RColorBrewer and viridis packages:

RColorBrewer::display.brewer.all() 

n_col <- 128

img <- function(obj, nam) {
  image(1:length(obj), 1, as.matrix(1:length(obj)), col=obj, 
        main = nam, ylab = "", xaxt = "n", yaxt = "n",  bty = "n")
}

par(mfrow=c(5, 1), mar=rep(1, 4))
img(rev(viridis::viridis(n_col)), "viridis")
img(rev(viridis::magma(n_col)), "magma")
img(rev(viridis::plasma(n_col)), "plasma")
img(rev(viridis::inferno(n_col)), "inferno")
img(rev(viridis::cividis(n_col)), "cividis")

The ts_heatmap function

Another useful visualization tool for seasonality analysis is the ts_heatmap function for time series objects, where the y axis represents the cycle units and x axis represents the years:

ts_heatmap(USgas)

Likewise the ts_seasonal function, the heatmap colors can be set by any of the palettes in the RColorBrewer and viridis packages with the color argument:

ts_heatmap(USgas, color = "Reds")

One of the main improvement in the current version is the ability to handle daily and weekly frequencies in addition to the monthly and quarterly. By default, daily data (whenever it has “Date” index) will appear in a format of weekday by year. For example, we will plot the daily demand for electricity in the UK (available in the UKgrid package):

#install.packages("UKgrid")
library(UKgrid)

UKgrid_daily <- extract_grid(type = "tbl", aggregate = "daily")
head(UKgrid_daily)
## # A tibble: 6 x 2
##   TIMESTAMP       ND
##   <date>       <int>
## 1 2011-01-01 1671744
## 2 2011-01-02 1760123
## 3 2011-01-03 1878748
## 4 2011-01-04 2076052
## 5 2011-01-05 2103866
## 6 2011-01-06 2135202
ts_heatmap(UKgrid_daily, color = "BuPu")

Quantile plot

Generally, as the frequency of the series is more granular (hence, the data was captured in higher frequency such as seconds, minutes, etc.), potentially the data would have different seasonality (or multi-seasonality) patterns. The ts_quantile function creates a quantile plots for time series data with the ability to use different time frequency subset. This function supports only time series objects with a Date or POSIX object as an index such as xts, zoo and data frame family (data.frame, data.table, and tbl) with a half-hourly, hourly, daily, monthly and quarterly frequencies.

A good example of a time series data with a multi-seasonality is the hourly demand for electricity which could have hourly (high consumption during the day lower during the night), weekly (high consumption during working days and lower during weekend and holidays), and monthly seasonality (consumption patterns depends on the region climeat across the year). To demonstrate the usages of this function we will load the UK national grid demand for electricity dataset again, however, this time will use an half-hour intervale format:

UKgrid_half_hour <- extract_grid(type = "xts", aggregate = NULL)
library(xts)
ts_info(UKgrid_half_hour)
##  The UKgrid_half_hour series is a xts object with 1 variable and 132912 observations
##  Frequency: 30 mins 
##  Start time: 2011-01-01 
##  End time: 2018-07-31 23:30:00
ts_plot(UKgrid_half_hour)

By default, the ts_quantile function will calculate and plot the 25th and 75th percentiles (the lower and upper lines) and the median (the solid line) by the series frequency units:

ts_quantile(UKgrid_half_hour, period = NULL, title = "The UK National Grid Net Demand for Electricity - Quantile Plot")

It is possible to modify the percentile range with the “lower” and “upper” argument. For instance, set the range between the 15th and 85th percentiles (the solid line remains the median):

ts_quantile(UKgrid_half_hour, 
            period = NULL, 
            lower = 0.15, 
            upper = 0.85,
            title = "The UK National Grid Net Demand for Electricity - 24 Hours Quantile Plot")

The period argument subset and plot the series quantiles by a specific frequency hierarchy. Hence, if the series frequency is a half-hour, we can subset and plot it by weekdays, months quarters and years. This allows checking if the series seasonal patterns vary on a different period’s subsets. For instance, when setting the period for weekdays the output is a 24-hour quantile (by half-hour intervals) for each day of the week:

ts_quantile(UKgrid_half_hour, 
            period = "weekdays",
            title = "The UK National Grid Net Demand for Electricity - 24 Hours Quantile Plot by Weekdays")

The n argument set the plot rows number. This allows spreading the plots when using a higher number of plots:

ts_quantile(UKgrid_half_hour, 
            period = "weekdays", 
            title = "The UK National Grid Net Demand for Electricity - 24 Hours Quantile Plot by Weekdays",
            n = 2)

Surface and polar plots

The ts_surface function provides a 3D representative for time series data

ts_surface(USgas)

The ts_polar function provides a polar plot demonstrative of time series data where the year is represented by color and the magnitude is represented by the size of the cycle unit layer:

ts_polar(USgas)

Correlation Analysis

The ts_lag, as the name implies, create a lag plot of time series data currently support only ts, xts, and zoo objects with monthly or quarterly frequency:

ts_lags(USgas)

By default, the function plots the first 12 lags, however you can set the number of lags by modifing the lags argument to either a sequence of lags (e.g., lags = 1:24 for the first 24 lags) or for a specifics lags. For example, you can plot the seasonal lags of the USgas dataset by selecting the 12, 24, 36 and 48 lags:

ts_lags(USgas, lags = c(12, 24, 36, 48))

The ts_acf and ts_pacf are wrappers for the stats package acf and pacf functions, providing a colorful and interactive version for those functions:

ts_acf(USgas, lag.max = 36)
ts_pacf(USgas, lag.max = 36)